
Collaborating Authors: Masovia Province


Repurposing Language Models into Embedding Models: Finding the Compute-Optimal Recipe
Bartosz Piotrowski, Wenda Li, Mateja Jamnik (University of Cambridge; IDEAS NCBR; University of Warsaw; IMPAN)

Neural Information Processing Systems

Text embeddings are essential for many tasks, such as document retrieval, clustering, and semantic similarity assessment. In this paper, we study how to contrastively train text embedding models in a compute-optimal fashion, given a suite of pre-trained decoder-only language models. Our innovation is an algorithm that produces optimal configurations of model sizes, data quantities, and fine-tuning methods for text-embedding models at different computational budget levels. The resulting recipe, which we obtain through extensive experiments, can be used by practitioners to make informed design choices for their embedding models. Specifically, our findings suggest that full fine-tuning and low-rank adaptation fine-tuning produce optimal models at lower and higher computational budgets respectively.
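As a hedged illustration of the contrastive objective such embedding models are typically trained with (not the paper's exact recipe; the temperature and similarity values below are invented), an in-batch InfoNCE loss can be sketched in plain Python:

```python
import math

def info_nce_loss(sim_matrix, temperature=0.05):
    """In-batch contrastive (InfoNCE) loss: row i's positive is column i.

    sim_matrix[i][j] is the similarity between query i and passage j.
    """
    n = len(sim_matrix)
    total = 0.0
    for i in range(n):
        logits = [s / temperature for s in sim_matrix[i]]
        m = max(logits)  # subtract max for numerical stability
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]  # negative log-softmax at the positive index
    return total / n

# Toy batch of 3 query/passage pairs: diagonal entries are the positives.
sims = [[0.9, 0.1, 0.2],
        [0.0, 0.8, 0.1],
        [0.2, 0.3, 0.7]]
loss = info_nce_loss(sims)
```

When the diagonal similarities dominate, as here, the loss is close to zero; shuffling the positives off the diagonal would increase it.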


Exploiting Activation Sparsity with Dense to Dynamic-k Mixture-of-Experts Conversion
Bartosz Wójcik, Mikołaj Piórczyński (IDEAS NCBR; Warsaw University of Technology; Jagiellonian University)

Neural Information Processing Systems

Transformer models can face practical limitations due to their high computational requirements. At the same time, such models exhibit significant activation sparsity, which can be leveraged to reduce the inference cost by converting parts of the network into equivalent Mixture-of-Experts (MoE) layers. Despite the crucial role played by activation sparsity, its impact on this process remains unexplored. We demonstrate that the efficiency of the conversion can be significantly enhanced by a proper regularization of the activation sparsity of the base model. Moreover, motivated by the high variance of the number of activated neurons for different inputs, we introduce a more effective dynamic-k expert selection rule that adjusts the number of executed experts on a per-token basis. To achieve further savings, we extend this approach to multi-head attention projections. Finally, we develop an efficient implementation that translates these computational savings into actual wallclock speedup. The proposed method, Dense to Dynamic-k Mixture-of-Experts (D2DMoE), outperforms existing approaches on common NLP and vision tasks, reducing inference cost by up to 60% without significantly impacting performance.
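A minimal sketch of the per-token dynamic-k idea (an assumption-laden toy: D2DMoE thresholds on predicted expert contribution norms, while raw router scores stand in here, and the threshold fraction `tau` is invented):

```python
def dynamic_k_select(router_scores, tau=0.1):
    """Select a per-token set of experts: keep every expert whose score
    exceeds a fraction tau of the largest score for that token."""
    best = max(router_scores)
    return [i for i, s in enumerate(router_scores) if s >= tau * best]

# One token spreads its activation over several experts, another over one,
# so the number of executed experts varies per token.
token_a = [0.9, 0.7, 0.5, 0.05]
token_b = [0.95, 0.02, 0.01, 0.03]
experts_a = dynamic_k_select(token_a)  # three experts pass the threshold
experts_b = dynamic_k_select(token_b)  # only one expert passes
```

This is the source of the savings the abstract describes: a fixed top-k rule would run the same number of experts for both tokens, while the dynamic rule spends compute only where the activation pattern demands it.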


Sparse is Enough in Scaling Transformers
Wojciech Gajewski, Henryk Michalewski, Jonni Kanerva (University of Warsaw; Google Research; OpenAI)

Neural Information Processing Systems

Large Transformer models yield impressive results on many tasks, but are expensive to train or even fine-tune, and so slow at decoding that their use and study become out of reach. We address this problem by leveraging sparsity. We study sparse variants for all layers in the Transformer and propose Scaling Transformers, a family of next generation Transformer models that use sparse layers to scale efficiently and perform unbatched decoding much faster than the standard Transformer as we scale up the model size. Surprisingly, the sparse layers are enough to obtain the same perplexity as the standard Transformer with the same number of parameters. We also integrate with prior sparsity approaches to attention and enable fast inference on long sequences even with limited memory.


Kolmogorov-Arnold networks for metal surface defect classification

arXiv.org Artificial Intelligence

Abstract: This paper presents the application of Kolmogorov-Arnold Networks (KAN) to classifying metal surface defects. Specifically, steel surfaces are analyzed to detect defects such as cracks, inclusions, patches, pitted surfaces, and scratches. Drawing on the Kolmogorov-Arnold theorem, KAN provides a novel approach compared to conventional multilayer perceptrons (MLPs), facilitating more efficient function approximation by utilizing spline functions. The results show that KAN networks can achieve better accuracy than convolutional neural networks (CNNs) with fewer parameters, resulting in faster convergence and improved performance in image classification. 1. Introduction: In recent years, there have been continuous advancements in neural network architectures, significantly contributing to progress in the image classification field [1,2,3]. Among the promising alternatives to traditional multilayer perceptrons (MLPs), Kolmogorov-Arnold Networks (KANs) leverage the Kolmogorov-Arnold representation theorem.
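The core KAN idea, learnable univariate functions on edges summed at nodes, can be sketched with piecewise-linear splines standing in for the B-splines used in practice (all grid sizes and coefficients below are illustrative, not from the paper):

```python
class SplineEdge:
    """One KAN edge: a learnable 1-D function, here a piecewise-linear
    spline on a uniform grid over [lo, hi] (real KANs use B-splines)."""
    def __init__(self, coeffs, lo=-1.0, hi=1.0):
        self.coeffs, self.lo, self.hi = coeffs, lo, hi

    def __call__(self, x):
        # Clamp to the grid, locate the surrounding knots, interpolate.
        n = len(self.coeffs) - 1
        t = min(max((x - self.lo) / (self.hi - self.lo), 0.0), 1.0) * n
        i = min(int(t), n - 1)
        frac = t - i
        return (1 - frac) * self.coeffs[i] + frac * self.coeffs[i + 1]

def kan_neuron(xs, edges):
    """A KAN 'neuron' sums its per-input univariate functions,
    Kolmogorov-Arnold style, instead of weighted sum + fixed activation."""
    return sum(edge(x) for edge, x in zip(edges, xs))

# Two inputs, each with its own learnable activation curve.
edges = [SplineEdge([0.0, 0.5, 1.0]),   # roughly linear ramp
         SplineEdge([1.0, 0.0, 1.0])]   # V-shape
y = kan_neuron([0.0, 0.0], edges)
```

Training a KAN means fitting the spline coefficients on every edge, which is where the parameter-efficiency argument in the abstract comes from: expressiveness lives in the edge functions rather than in wide dense layers.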


Fox News AI Newsletter: DC air defense gets major upgrade

FOX News

AI CAMERA SURVEILLANCE: The National Capital Region (NCR) is rolling out an advanced artificial intelligence-based visual recognition system that's taking air defense to a whole new level. THE FUTURE IS NOW: Autonomous, unmanned drones and artificial intelligence have already begun to shape the wars today and the future. Two US Air Force F-35 jets and a Polish Air Force F-16 take part in a military parade in Warsaw on Polish Army Day, August 15, 2023, to commemorate the anniversary of the 1920 victory over Soviet Russia at the Battle of Warsaw during the Polish-Soviet War. STAYING IN FIRST PLACE: As the U.S. races to maintain its global leadership in AI, much of the conversation revolves around natural language processing, the reshoring of the semiconductor supply chain and powering data centers. A visitor watches an AI (Artificial Intelligence) sign on an animated screen at the Mobile World Congress (MWC), the telecom industry's biggest annual gathering, in Barcelona.


Fox News AI Newsletter: Amazon's $4B bet on an AI startup

FOX News

Many people in Nashville say they don't trust artificial intelligence chatbots to give them unbiased information amid the backlash Google faces over its Gemini program. Businessman chatting through chatbot Online customer service with chat bots for support. AI INVESTMENT: Anthropic announced Friday that the company is receiving a $4 billion investment from Amazon to help advance the startup's efforts to develop artificial intelligence systems. Microsoft Bing Chat and ChatGPT AI chat applications are seen on a mobile device in this photo illustration in Warsaw, Poland, on July 21, 2023. SMART PLANNING: Here are a few ways to turn AI into your travel agent.


Google Earth lets you explore the past now, and Maps gets one of its biggest updates ever

ZDNet

Whether you're looking up the storefront of a specific restaurant or exploring a continent virtually, Google Maps and Google Earth are great tools for viewing a part of the world from your device. Now both applications are getting updates to make what you see even better. On Wednesday, Google announced three new updates for Google Earth and Maps to enrich the user viewing and exploring experience. The first update is the availability of historical imagery on Google Earth, which allows users to explore Google's satellite and aerial imagery library from as far back as 80 years ago. Historical imagery showcases changes over time, especially in places like London, Berlin, Warsaw, and Paris, where you can see imagery from as far back as the 1930s.


Combining data from multiple sources for urban travel mode choice modelling

arXiv.org Artificial Intelligence

Demand for sustainable mobility is particularly high in urban areas. Hence, there is a growing need to predict when people will decide to use different travel modes, with an emphasis on environmentally friendly ones. As travel mode choice (TMC) is influenced by multiple factors, machine learning methods are increasingly used to predict travel mode choices given respondent and journey features. Typically, travel diaries provide the core relevant data. However, other features, such as attributes of mode alternatives including, but not limited to, travel times and, in the case of public transport (PT), walking distances, have a major impact on whether a person decides to use a travel mode of interest. Hence, in this work, we propose the architecture of a software platform that performs data fusion, combining data documenting journeys with features calculated to summarise the transport options available for these journeys, the built environment, and environmental factors such as weather conditions that possibly influence travel mode decisions. Furthermore, we propose various novel features, many of which we show to be among the most important for TMC prediction, and we propose how stream processing engines and other Big Data systems can be used for their calculation. The data processed by the platform is used to develop machine learning models predicting travel mode choices. To validate the platform, we propose ablation studies investigating the importance of individual feature subsets calculated by it and their impact on the TMC models built with them. In our experiments, we combine survey data, GPS traces, weather and pollution time series, transport model data, and spatial data of the built environment. The growth in the accuracy of TMC models built with the additional features is up to 18.2% compared to the use of core survey data only.
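A minimal, hypothetical sketch of one fusion step such a platform performs, enriching a travel-diary journey record with the nearest weather observation and a derived walking-distance feature (all field names, thresholds, and values below are invented for illustration):

```python
from datetime import datetime

# A travel-diary journey and two weather observations around its start time.
journeys = [{"id": 1, "start": datetime(2023, 5, 10, 8, 20), "mode": "PT",
             "walk_to_stop_m": 420}]
weather = [(datetime(2023, 5, 10, 8, 0), {"temp_c": 12.0, "rain_mm": 0.4}),
           (datetime(2023, 5, 10, 9, 0), {"temp_c": 13.5, "rain_mm": 0.0})]

def fuse(journey, observations):
    """Join a journey with the temporally nearest weather observation
    and derive an extra feature for the TMC model."""
    nearest = min(observations,
                  key=lambda obs: abs((obs[0] - journey["start"]).total_seconds()))
    enriched = dict(journey, **nearest[1])
    enriched["long_walk"] = journey["walk_to_stop_m"] > 400  # derived feature
    return enriched

row = fuse(journeys[0], weather)  # picks the 08:00 observation
```

In the platform described by the abstract, joins like this run over GPS traces, weather and pollution time series, and transport-model outputs at scale, which is why stream processing engines are proposed for the feature calculation.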


How scanning probe microscopy can be supported by Artificial Intelligence and quantum computing

arXiv.org Artificial Intelligence

Abstract: The impact of Artificial Intelligence (AI) is expanding rapidly, revolutionizing both science and society. It is applied to practically all areas of life, science, and technology, including materials science, which continuously needs novel tools for effective materials characterization. One of the widely used techniques is scanning probe microscopy (SPM). SPM has fundamentally changed materials engineering, biology, and chemistry by delivering tools for atomic-precision surface mapping. Besides many advantages, it also has some drawbacks. In this paper, we focus on the potential for supporting SPM-based measurements, putting emphasis on the application of AI-based algorithms, especially Machine Learning-based algorithms, as well as quantum computing (QC). It turns out that AI can be helpful in automating routine experimental operations and in the algorithmic search for good sample regions, and it can shed light on structure-property relationships. Thus, it contributes to increasing the efficiency and accuracy of optical nanoscopy scanning probes. Moreover, the combination of AI-based algorithms and QC may have huge potential to increase the practical application of SPM. The limitations of the AI-QC-based approach are also discussed. Finally, we outline a research path for the improvement of AI-QC-powered SPM. I. Introduction: Techniques such as scanning near-field optical microscopy (SNOM) are universal tools for materials' surface characterization. SPM enables obtaining a high-resolution 3D surface profile in a nondestructive measurement.


Characterizing Mechanisms for Factual Recall in Language Models

arXiv.org Artificial Intelligence

Language Models (LMs) often must integrate facts they memorized in pretraining with new information that appears in a given context. These two sources can disagree, causing competition within the model, and it is unclear how an LM will resolve the conflict. On a dataset that queries for knowledge of world capitals, we investigate both distributional and mechanistic determinants of LM behavior in such situations. Specifically, we measure the proportion of the time an LM will use a counterfactual prefix (e.g., "The capital of Poland is London") to overwrite what it learned in pretraining ("Warsaw"). On Pythia and GPT2, the training frequency of both the query country ("Poland") and the in-context city ("London") highly affect the models' likelihood of using the counterfactual. We then use head attribution to identify individual attention heads that either promote the memorized answer or the in-context answer in the logits. By scaling up or down the value vector of these heads, we can control the likelihood of using the in-context answer on new data. This method can increase the rate of generating the in-context answer to 88% of the time simply by scaling a single head at runtime. Our work contributes to a body of evidence showing that we can often localize model behaviors to specific components and provides a proof of concept for how future methods might control model behavior dynamically at runtime.
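The runtime intervention can be sketched in a toy, single-head setting (all vectors and weights below are invented; real head attribution operates on the internals of a trained model):

```python
# Toy illustration of the paper's intervention: scaling one attention
# head's value vector shifts the logit competition between the
# in-context answer and the memorized answer.

def head_output(value_vec, attn_weight, scale=1.0):
    # The head writes attn_weight * value_vec into the residual stream;
    # `scale` is the runtime knob applied to the value vector.
    return [scale * attn_weight * v for v in value_vec]

def logits(hidden, unembed):
    return [sum(h * w for h, w in zip(hidden, row)) for row in unembed]

in_context_dir = [1.0, 0.0]   # direction promoting the in-context token
unembed = [[1.0, 0.0],        # logit row for in-context answer ("London")
           [0.0, 1.0]]        # logit row for memorized answer ("Warsaw")
residual = [0.2, 0.6]         # memorized answer currently wins

results = {}
for scale in (1.0, 3.0):
    hidden = [r + o for r, o in
              zip(residual, head_output(in_context_dir, 0.3, scale))]
    results[scale] = logits(hidden, unembed)
# At scale 1.0 the memorized logit is larger; at 3.0 the in-context one wins.
```

The point of the toy is the mechanism, not the numbers: once a head that writes the in-context direction is identified, a single scalar applied to its value vector at runtime is enough to flip which answer the model emits.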